Text-Guided Generation and Editing of Compositional 3D Avatars
Our goal is to create a realistic 3D facial avatar with hair and accessories
using only a text description. While this challenge has attracted significant
recent interest, existing methods either lack realism, produce unrealistic
shapes, or do not support editing, such as modifications to the hairstyle. We
argue that existing methods are limited because they employ a monolithic
modeling approach, using a single representation for the head, face, hair, and
accessories. Our observation is that the hair and face, for example, have very
different structural qualities that benefit from different representations.
Building on this insight, we generate avatars with a compositional model, in
which the head, face, and upper body are represented with traditional 3D
meshes, and the hair, clothing, and accessories with neural radiance fields
(NeRF). The model-based mesh representation provides a strong geometric prior
for the face region, improving realism while enabling editing of the person's
appearance. By using NeRFs to represent the remaining components, our method is
able to model and synthesize parts with complex geometry and appearance, such
as curly hair and fluffy scarves. Our novel system synthesizes these
high-quality compositional avatars from text descriptions. The experimental
results demonstrate that our method, Text-guided generation and Editing of
Compositional Avatars (TECA), produces avatars that are more realistic than
those of recent methods while being editable because of their compositional
nature. For example, TECA enables the seamless transfer of compositional
features like hairstyles, scarves, and other accessories between avatars. This
capability supports applications such as virtual try-on.
Comment: Home page: https://yfeng95.github.io/tec
The Caltech Fish Counting Dataset: A Benchmark for Multiple-Object Tracking and Counting
We present the Caltech Fish Counting Dataset (CFC), a large-scale dataset for
detecting, tracking, and counting fish in sonar videos. We identify sonar
videos as a rich source of data for advancing low signal-to-noise computer
vision applications and tackling domain generalization in multiple-object
tracking (MOT) and counting. In comparison to existing MOT and counting
datasets, which are largely restricted to videos of people and vehicles in
cities, CFC is sourced from a natural-world domain where targets are not easily
resolvable and appearance features cannot be easily leveraged for target
re-identification. With over half a million annotations in over 1,500 videos
sourced from seven different sonar cameras, CFC allows researchers to train MOT
and counting algorithms and evaluate generalization performance at unseen test
locations. We perform extensive baseline experiments and identify key
challenges and opportunities for advancing the state of the art in
generalization in MOT and counting.
Comment: ECCV 2022. 33 pages, 12 figures
Caltech Fish Counting Dataset 2022
A full instructional guide is provided here: https://github.com/visipedia/caltech-fish-counting
The Caltech Fish Counting Dataset is a large-scale dataset for detecting, tracking, and counting fish in sonar videos.